Multidimensional Scaling

  1. Introduction and Settings

Imagine there are \(N\) objects (geographical locations, characteristics of people, etc.). In some cases it is hard to represent each of these \(N\) objects explicitly with a vector \(\vec{y} \in \mathbb{R}^{t}\); however, we can measure the degree of “closeness” (proximity) between these \(N\) objects. In other words, we can obtain an \(N \times N\) matrix \(\Delta\), with \(\Delta_{rs}\) representing the distance/dissimilarity between objects \(r\) and \(s\). Assume \(\Delta_{rr}=0\) and \(\Delta_{sr} = \Delta_{rs}\). \(\Delta\) is often called a Proximity/Distance/Dissimilarity Matrix.

Given the Proximity Matrix \(\Delta\), MDS (Multidimensional Scaling) comes into play when we want to recover the original/latent configuration \(\vec{y}_1, \vec{y}_2, \dots, \vec{y}_N \in \mathbb{R}^{t}\).

The primary goal of MDS is to reconstruct the original configuration as closely as possible, i.e., to find points \(\tilde{y}_1, \tilde{y}_2, \dots, \tilde{y}_N \in \mathbb{R}^{t^{\prime}}\) such that the distance between \(\tilde{y}_r\) and \(\tilde{y}_s\) is close to \(\Delta_{rs}\). Sometimes the true dimension \(t\) is unknown, so determining the number of dimensions is itself a major problem to be solved.

MDS can also serve as a data visualization method. Usually, \(t^{\prime}\) is set to two or three, and the \(N\) points in the two- or three-dimensional plot may show an obvious sign of clustering, where points within a cluster are “closer” to one another than to points in other clusters.

  2. Example

Here we use pairwise distances between large cities as an example to illustrate the power of MDS in recovering the original configuration and visualizing it.

The eurodist dataset (in R’s built-in datasets package) provides road distances between 21 cities in Europe; the companion UScitiesD dataset provides straight-line distances between 10 cities in the US. For the moment, we will focus on the 10 US cities, with names and distances given in the following table.

| | Atlanta | Chicago | Denver | Houston | LosAngeles | Miami | NewYork | SanFrancisco | Seattle | Washington.DC |
|---|---|---|---|---|---|---|---|---|---|---|
| Atlanta | 0 | 587 | 1212 | 701 | 1936 | 604 | 748 | 2139 | 2182 | 543 |
| Chicago | 587 | 0 | 920 | 940 | 1745 | 1188 | 713 | 1858 | 1737 | 597 |
| Denver | 1212 | 920 | 0 | 879 | 831 | 1726 | 1631 | 949 | 1021 | 1494 |
| Houston | 701 | 940 | 879 | 0 | 1374 | 968 | 1420 | 1645 | 1891 | 1220 |
| LosAngeles | 1936 | 1745 | 831 | 1374 | 0 | 2339 | 2451 | 347 | 959 | 2300 |
| Miami | 604 | 1188 | 1726 | 968 | 2339 | 0 | 1092 | 2594 | 2734 | 923 |
| NewYork | 748 | 713 | 1631 | 1420 | 2451 | 1092 | 0 | 2571 | 2408 | 205 |
| SanFrancisco | 2139 | 1858 | 949 | 1645 | 347 | 2594 | 2571 | 0 | 678 | 2442 |
| Seattle | 2182 | 1737 | 1021 | 1891 | 959 | 2734 | 2408 | 678 | 0 | 2329 |
| Washington.DC | 543 | 597 | 1494 | 1220 | 2300 | 923 | 205 | 2442 | 2329 | 0 |

After conducting classical MDS (one method of MDS; details are discussed in the following parts), we obtain the plot below. The plot agrees with our geographical knowledge. Miami and Seattle are the farthest apart on the plot, and these two cities are indeed thousands of miles apart (Seattle in the north-west corner of the US mainland and Miami in the south-east corner). NY and D.C. are quite close on the plot, and the same holds for LA and San Francisco.

In fact, if we rotate the above plot by 180 degrees, a map of the US appears. It is amazing!
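The analysis here was presumably done with R’s `cmdscale`; as a minimal sketch of the same computation in Python/NumPy, here is classical MDS applied to a four-city subset of the table above (the subset choice and all variable names are our own):

```python
import numpy as np

# Four-city subset of the US distance table above (miles).
cities = ["Atlanta", "Chicago", "Denver", "LosAngeles"]
D = np.array([
    [0.0,    587.0, 1212.0, 1936.0],
    [587.0,    0.0,  920.0, 1745.0],
    [1212.0, 920.0,    0.0,  831.0],
    [1936.0, 1745.0, 831.0,    0.0],
])

N = len(cities)
H = np.eye(N) - np.ones((N, N)) / N      # centering matrix
B = -0.5 * H @ (D ** 2) @ H              # double-centered "inner product" matrix

vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
idx = np.argsort(vals)[::-1][:2]         # keep the two largest eigenvalues
Y = vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))  # 2-D configuration

# Pairwise distances of the recovered points approximate the input table.
recovered = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
print(np.round(recovered))
```

The recovered configuration is only determined up to rotation, reflection and translation, so the coordinates themselves need not look like map coordinates, but the pairwise distances closely match the table.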

Again, let’s try MDS on the 21 European cities. We flip the resulting configuration so that it matches the true map of Europe.

| | Athens | Barcelona | Brussels | Calais | Cherbourg | Cologne | Copenhagen | Geneva | Gibraltar | Hamburg | Hook of Holland | Lisbon | Lyons | Madrid | Marseilles | Milan | Munich | Paris | Rome | Stockholm | Vienna |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Athens | 0 | 3313 | 2963 | 3175 | 3339 | 2762 | 3276 | 2610 | 4485 | 2977 | 3030 | 4532 | 2753 | 3949 | 2865 | 2282 | 2179 | 3000 | 817 | 3927 | 1991 |
| Barcelona | 3313 | 0 | 1318 | 1326 | 1294 | 1498 | 2218 | 803 | 1172 | 2018 | 1490 | 1305 | 645 | 636 | 521 | 1014 | 1365 | 1033 | 1460 | 2868 | 1802 |
| Brussels | 2963 | 1318 | 0 | 204 | 583 | 206 | 966 | 677 | 2256 | 597 | 172 | 2084 | 690 | 1558 | 1011 | 925 | 747 | 285 | 1511 | 1616 | 1175 |
| Calais | 3175 | 1326 | 204 | 0 | 460 | 409 | 1136 | 747 | 2224 | 714 | 330 | 2052 | 739 | 1550 | 1059 | 1077 | 977 | 280 | 1662 | 1786 | 1381 |
| Cherbourg | 3339 | 1294 | 583 | 460 | 0 | 785 | 1545 | 853 | 2047 | 1115 | 731 | 1827 | 789 | 1347 | 1101 | 1209 | 1160 | 340 | 1794 | 2196 | 1588 |
| Cologne | 2762 | 1498 | 206 | 409 | 785 | 0 | 760 | 1662 | 2436 | 460 | 269 | 2290 | 714 | 1764 | 1035 | 911 | 583 | 465 | 1497 | 1403 | 937 |
| Copenhagen | 3276 | 2218 | 966 | 1136 | 1545 | 760 | 0 | 1418 | 3196 | 460 | 269 | 2971 | 1458 | 2498 | 1778 | 1537 | 1104 | 1176 | 2050 | 650 | 1455 |
| Geneva | 2610 | 803 | 677 | 747 | 853 | 1662 | 1418 | 0 | 1975 | 1118 | 895 | 1936 | 158 | 1439 | 425 | 328 | 591 | 513 | 995 | 2068 | 1019 |
| Gibraltar | 4485 | 1172 | 2256 | 2224 | 2047 | 2436 | 3196 | 1975 | 0 | 2897 | 2428 | 676 | 1817 | 698 | 1693 | 2185 | 2565 | 1971 | 2631 | 3886 | 2974 |
| Hamburg | 2977 | 2018 | 597 | 714 | 1115 | 460 | 460 | 1118 | 2897 | 0 | 550 | 2671 | 1159 | 2198 | 1479 | 1238 | 805 | 877 | 1751 | 949 | 1155 |
| Hook of Holland | 3030 | 1490 | 172 | 330 | 731 | 269 | 269 | 895 | 2428 | 550 | 0 | 2280 | 863 | 1730 | 1183 | 1098 | 851 | 457 | 1683 | 1500 | 1205 |
| Lisbon | 4532 | 1305 | 2084 | 2052 | 1827 | 2290 | 2971 | 1936 | 676 | 2671 | 2280 | 0 | 1178 | 668 | 1762 | 2250 | 2507 | 1799 | 2700 | 3231 | 2937 |
| Lyons | 2753 | 645 | 690 | 739 | 789 | 714 | 1458 | 158 | 1817 | 1159 | 863 | 1178 | 0 | 1281 | 320 | 328 | 724 | 471 | 1048 | 2108 | 1157 |
| Madrid | 3949 | 636 | 1558 | 1550 | 1347 | 1764 | 2498 | 1439 | 698 | 2198 | 1730 | 668 | 1281 | 0 | 1157 | 1724 | 2010 | 1273 | 2097 | 3188 | 2409 |
| Marseilles | 2865 | 521 | 1011 | 1059 | 1101 | 1035 | 1778 | 425 | 1693 | 1479 | 1183 | 1762 | 320 | 1157 | 0 | 618 | 1109 | 792 | 1011 | 2428 | 1363 |
| Milan | 2282 | 1014 | 925 | 1077 | 1209 | 911 | 1537 | 328 | 2185 | 1238 | 1098 | 2250 | 328 | 1724 | 618 | 0 | 331 | 856 | 586 | 2187 | 898 |
| Munich | 2179 | 1365 | 747 | 977 | 1160 | 583 | 1104 | 591 | 2565 | 805 | 851 | 2507 | 724 | 2010 | 1109 | 331 | 0 | 821 | 946 | 1754 | 428 |
| Paris | 3000 | 1033 | 285 | 280 | 340 | 465 | 1176 | 513 | 1971 | 877 | 457 | 1799 | 471 | 1273 | 792 | 856 | 821 | 0 | 1476 | 1827 | 1249 |
| Rome | 817 | 1460 | 1511 | 1662 | 1794 | 1497 | 2050 | 995 | 2631 | 1751 | 1683 | 2700 | 1048 | 2097 | 1011 | 586 | 946 | 1476 | 0 | 2707 | 1209 |
| Stockholm | 3927 | 2868 | 1616 | 1786 | 2196 | 1403 | 650 | 2068 | 3886 | 949 | 1500 | 3231 | 2108 | 3188 | 2428 | 2187 | 1754 | 1827 | 2707 | 0 | 2105 |
| Vienna | 1991 | 1802 | 1175 | 1381 | 1588 | 937 | 1455 | 1019 | 2974 | 1155 | 1205 | 2937 | 1157 | 2409 | 1363 | 898 | 428 | 1249 | 1209 | 2105 | 0 |

The above plot reconstructs the map of Europe quite well. Gibraltar, Lisbon and Madrid are in the south-west corner, the two Northern European cities Stockholm and Copenhagen are at the northern end, and Athens is in the south-east corner.

Finally, let’s consider 18 representative global cities. Their pairwise flight distances (great-circle, i.e. geodesic, distances) are shown in the table below. As we can see, the geodesic distances between the three Southern-Hemisphere cities (Rio, Cape Town, Melbourne) and the Northern-Hemisphere cities are generally large (almost all over 10,000 kilometers).

| | Beijing | Cape Town | Hong Kong | Honolulu | London | Melbourne | Mexico City | Montreal | Moscow | New Delhi | New York | Paris | Rio | Rome | S.F. | Singapore | Stockholm | Tokyo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Beijing | 0 | 12947 | 1972 | 8171 | 8160 | 9093 | 12478 | 10490 | 5809 | 2788 | 11012 | 8236 | 17325 | 8144 | 9524 | 4465 | 6725 | 2104 |
| Cape Town | 12947 | 0 | 11867 | 18562 | 9635 | 10388 | 13703 | 12744 | 10101 | 9284 | 12551 | 9307 | 6075 | 8417 | 16487 | 9671 | 10334 | 14737 |
| Hong Kong | 1972 | 11867 | 0 | 8945 | 9646 | 7392 | 14155 | 12462 | 7158 | 3770 | 12984 | 9650 | 17710 | 9300 | 11121 | 2575 | 8243 | 2893 |
| Honolulu | 8171 | 18562 | 8945 | 0 | 11653 | 8862 | 6098 | 7915 | 11342 | 11930 | 7996 | 11988 | 13343 | 12936 | 3857 | 10824 | 11059 | 6208 |
| London | 8160 | 9635 | 9646 | 11653 | 0 | 16902 | 8947 | 5240 | 2506 | 6724 | 5586 | 341 | 9254 | 1434 | 8640 | 10860 | 1436 | 9585 |
| Melbourne | 9093 | 10388 | 7392 | 8862 | 16902 | 0 | 13557 | 16730 | 14418 | 10192 | 16671 | 16793 | 13227 | 15987 | 12644 | 6050 | 15593 | 8159 |
| Mexico City | 12478 | 13703 | 14155 | 6098 | 8947 | 13557 | 0 | 3728 | 10740 | 14679 | 3362 | 9213 | 7669 | 10260 | 3038 | 16623 | 9603 | 11319 |
| Montreal | 10490 | 12744 | 12462 | 7915 | 5240 | 16730 | 3728 | 0 | 7077 | 11286 | 533 | 5522 | 8175 | 6601 | 4092 | 14816 | 5900 | 10409 |
| Moscow | 5809 | 10101 | 7158 | 11342 | 2506 | 14418 | 10740 | 7077 | 0 | 4349 | 7530 | 2492 | 11529 | 2378 | 9469 | 8426 | 1231 | 7502 |
| New Delhi | 2788 | 9284 | 3770 | 11930 | 6724 | 10192 | 14679 | 11286 | 4349 | 0 | 11779 | 6601 | 14080 | 5929 | 12380 | 4142 | 5579 | 5857 |
| New York | 11012 | 12551 | 12984 | 7996 | 5586 | 16671 | 3362 | 533 | 7530 | 11779 | 0 | 5851 | 7729 | 6907 | 4140 | 15349 | 6336 | 10870 |
| Paris | 8236 | 9307 | 9650 | 11988 | 341 | 16793 | 9213 | 5522 | 2492 | 6601 | 5851 | 0 | 9146 | 1108 | 8975 | 10743 | 1546 | 9738 |
| Rio | 17325 | 6075 | 17710 | 13343 | 9254 | 13227 | 7669 | 8175 | 11529 | 14080 | 7729 | 9146 | 0 | 9181 | 10647 | 15740 | 10682 | 18557 |
| Rome | 8144 | 8417 | 9300 | 12936 | 1434 | 15987 | 10260 | 6601 | 2378 | 5929 | 6907 | 1108 | 9181 | 0 | 10071 | 10030 | 1977 | 9881 |
| S.F. | 9524 | 16487 | 11121 | 3857 | 8640 | 12644 | 3038 | 4092 | 9469 | 12380 | 4140 | 8975 | 10647 | 10071 | 0 | 13598 | 8644 | 8284 |
| Singapore | 4465 | 9671 | 2575 | 10824 | 10860 | 6050 | 16623 | 14816 | 8426 | 4142 | 15349 | 10743 | 15740 | 10030 | 13598 | 0 | 9646 | 5317 |
| Stockholm | 6725 | 10334 | 8243 | 11059 | 1436 | 15593 | 9603 | 5900 | 1231 | 5579 | 6336 | 1546 | 10682 | 1977 | 8644 | 9646 | 0 | 8193 |
| Tokyo | 2104 | 14737 | 2893 | 6208 | 9585 | 8159 | 11319 | 10409 | 7502 | 5857 | 10870 | 9738 | 18557 | 9881 | 8284 | 5317 | 8193 | 0 |

The three-dimensional MDS visualization is shown above; in the interactive version you can rotate and magnify it. The blue points represent Asian cities, the black points European cities, and the red points North American cities; cities within each group cluster together. If you rotate the plot so that these three clusters sit at the bottom, you will find the three remaining cities, Rio, Melbourne and Cape Town, at the top of the plot. This reflects the fact that they are the only Southern-Hemisphere cities in the sample. If you inspect the plot closely, you will find that the cities actually lie on a sphere, which matches reality. MDS is quite successful in recovering their relative locations!

  3. Key features of MDS

Besides demonstrating the capacity of MDS to recover original configurations and to visualize them, we also gain two observations from the examples above.

  1. The pairwise distances/proximities between objects do not have to be Euclidean distances; in the example above, they are geodesic distances. Based on the type of distance metric and the specific method used to recover a configuration from the Proximity Matrix, MDS can be divided into three major types: Classical Scaling, Metric MDS, and Non-Metric MDS.

Classical Scaling and Metric MDS generally require that the input data is a true distance metric, while Non-metric MDS is usually used when the input data doesn’t satisfy the properties of a true distance metric or when the relationships are ordinal (i.e., we only know which distances are larger, but not by how much).

In addition, Classical Scaling operates by applying an eigendecomposition to the “double-centered” dissimilarity matrix, while Metric MDS and Non-Metric MDS typically employ iterative optimization methods to better accommodate non-linear relationships in the data.

  2. As we can tell from the recovery of the US and European maps, the configuration \(\tilde{y}_1, \tilde{y}_2, \dots, \tilde{y}_N\) is not unique, since we can rotate or flip the map. In fact, if \(\tilde{y}_1, \tilde{y}_2, \dots, \tilde{y}_N \in \mathbb{R}^{t^{\prime}}\) is an optimal solution, then for any vector \(\vec{b} \in \mathbb{R}^{t^{\prime}}\) and orthogonal matrix \(A \in \mathbb{R}^{t^{\prime} \times t^{\prime}}\), \(||(A \tilde{y}_r + \vec{b}) - (A \tilde{y}_s + \vec{b})|| = ||\tilde{y}_r - \tilde{y}_s||\): rotation, reflection and translation do not alter pairwise distances. So \(A \tilde{y}_1 + \vec{b}, A \tilde{y}_2 + \vec{b}, \dots, A \tilde{y}_N + \vec{b}\) is also an optimal solution.
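This invariance is easy to verify numerically; a small sketch (Python/NumPy; the synthetic points and names are our own construction):

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(5, 2))              # five points in R^2 (rows)

# A random orthogonal matrix A (QR of a Gaussian matrix) and translation b.
A, _ = np.linalg.qr(rng.normal(size=(2, 2)))
b = rng.normal(size=2)

Z = Y @ A.T + b                          # z_r = A y_r + b for each row r

dist = lambda X: np.linalg.norm(X[:, None] - X[None, :], axis=-1)
print(np.allclose(dist(Y), dist(Z)))     # True: pairwise distances unchanged
```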

Classical Scaling

Before going deeper into Classical Scaling, we first need to introduce some important definitions.

Definitions:

  1. A matrix \(\Delta \in \mathbb{R}^{N \times N}\) is called a distance matrix if it possesses the following properties:
  1. Symmetry

\(\Delta\) is symmetric, meaning that:

\[\Delta_{rs} = \Delta_{sr} \quad \text{for all } r \text{ and } s\]

This implies that the distance between points \(r\) and \(s\) is the same as the distance between points \(s\) and \(r\).

  2. Zero Diagonal

The diagonal entries of the matrix represent the distance of a point to itself, and are thus all zeros:

\[\Delta_{rr} \equiv 0 \quad \text{for all } 1 \leq r \leq N\]

  3. Non-negativity

All distances are non-negative:

\[\Delta_{rs} \geq 0\]

  4. Triangle Inequality

The matrix respects the triangle inequality:

\[\Delta_{rs} \leq \Delta_{rt} + \Delta_{ts}\]

This property ensures that the direct distance between two points \(r\) and \(s\) is never greater than the sum of the distances from \(r\) to a third point \(t\) and from \(t\) to \(s\).

  2. Further, the distance matrix \(\Delta \in \mathbb{R}^{N \times N}\) is a Euclidean distance matrix if there exists a configuration \(\vec{y}_1, \vec{y}_2, \dots, \vec{y}_N\) s.t. \(\Delta_{rs}\) represents the Euclidean distance between points \(r\) and \(s\), i.e., \(||\vec{y}_r-\vec{y}_s||=\Delta_{rs}\) for all \(r,s\).

For a configuration \(\vec{y}_1, \dots, \vec{y}_N \in \mathbb{R}^t\), the Euclidean distance between points \(i\) and \(j\) is defined as \(\delta_{ij}=||\vec{y}_i - \vec{y}_j||= \left\{\sum_{k=1}^t \left(y_{i k}-y_{j k}\right)^2\right\}^{1/2}\). Remember to take the square root.

Prerequisites for Classical Scaling:
  1. Strictly speaking, we require that the Proximity Matrix \(\Delta\) is a Euclidean distance matrix.

  2. Furthermore, it is assumed that the configurations \(\vec{y}_1, \vec{y}_2, \dots, \vec{y}_N\) are centered. In other words, \(\sum_{i=1}^N \vec{y}_i = \vec{0}\). This step eliminates the influence of location.

Recover Coordinates

At first glance it may sound difficult to recover the configuration \(\vec{y}_1, \vec{y}_2, \dots, \vec{y}_N\) from the Euclidean distance matrix \(\Delta\) alone. However, the situation becomes clearer after some analysis.

\(\Delta^2_{rs}\) can be expanded as \(\Delta^2_{rs} = ||\vec{y}_r||^2 + ||\vec{y}_s||^2 - 2 \vec{y}_r^T \vec{y}_s\). Each term on the right-hand side is easily expressed through the original configuration matrix \(Y = \left(\begin{array}{c} \vec{y}_1^{\top} \\ \vec{y}_2^{\top} \\ \vdots \\ \vec{y}_N^{\top} \end{array}\right)\).

Specifically, the entries of the inner product matrix \(B = YY^T \in \mathbb{R}^{N \times N}\) determine the entries of the Euclidean distance matrix \(\Delta\).

1) Computing the inner product matrix \(B\)

The inner product term is an entry of \(B\): \(B_{ij} = \vec{y}_i^T \vec{y}_j\), while the norms of \(\vec{y}_i\) and \(\vec{y}_j\) are given by the diagonal entries, \(B_{ii} = ||\vec{y}_i||^2\). As a result, \(\Delta^2_{ij} = B_{ii} + B_{jj} - 2 B_{ij}\).

Key Observation: here we successfully expressed the entries of \(\Delta\) (known) in terms of the entries of \(B\) (unknown). However, our goal is the reverse: to express the entries of \(B\) in terms of the entries of \(\Delta\). Intuitively, we want to cancel out the \(B_{ij}\) terms. Since \(\sum_{i=1}^N \vec{y}_i = \vec{0}\), we have \(\sum_{i=1}^N B_{ij} = (\sum_{i=1}^N \vec{y}_i)^T \vec{y}_j = 0\), so summing both sides of the equation over the index \(i\) yields \[\sum_{i=1}^N \Delta_{ij}^2=\operatorname{tr}(B) + N B_{jj}\]

Similarly, summing both sides over the index \(j\) yields \[\sum_{j=1}^N \Delta_{ij}^2=\operatorname{tr}(B) + N B_{ii}\]

These steps eliminate all the off-diagonal terms of \(B\). Now we take one step further and sum both sides over both indices \(i\) and \(j\), which gives \[\sum_{i=1}^N \sum_{j=1}^N \Delta_{ij}^2 = 2N \operatorname{tr}(B)\]

Now we can solve for the entries of \(B\) in terms of the entries of \(\Delta\) through a backward calculation. From the last equation, we get \[\operatorname{tr}(B) = \frac{1}{2N} \sum_{i=1}^N \sum_{j=1}^N \Delta_{ij}^2\]

Substituting this expression into the formulas above, we get the diagonal entries of \(B\):

\[B_{ii} = \frac{1}{N} (\sum_{j=1}^{N} \Delta_{ij}^2 - \operatorname{tr} (B))\]

After that, we can finally get the off-diagonal entries of \(B\):

\[\begin{aligned} B_{ij} & = \frac{1}{2} (B_{ii} + B_{jj} - \Delta_{ij}^2) \\ & = -\frac{1}{2} \Delta_{ij}^2 + \frac{1}{2N} \sum_{k=1}^N \Delta_{ik}^2 + \frac{1}{2N} \sum_{k=1}^N \Delta_{kj}^2-\frac{1}{2 N^2} \sum_{k=1}^N \sum_{l=1}^N \Delta_{kl}^2 \end{aligned}\]

In fact, we may also express the inner product matrix \(B\) in matrix form, which is what we use in real-data computation. Here \(A \in \mathbb{R}^{N \times N}\) with \(A_{ij} = -\frac{1}{2}\Delta^2_{ij}\) for all \(1 \leq i,j \leq N\), and \(H = \mathbb{I}_N - \frac{1}{N} \mathbb{1}_N \mathbb{1}_N^T\) is the centering matrix.

\[B = H A H\]
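As a quick numerical sanity check of this matrix identity, the following sketch (Python/NumPy; the synthetic configuration and names are our own) verifies that double centering of \(-\frac{1}{2}\Delta^2\) recovers \(YY^T\) for a centered configuration:

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.normal(size=(6, 3))
Y -= Y.mean(axis=0)                      # center: sum of the y_i is zero

# Euclidean distance matrix of the configuration.
D = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)

N = len(Y)
A = -0.5 * D ** 2                        # A_ij = -(1/2) * Delta_ij^2
H = np.eye(N) - np.ones((N, N)) / N      # centering matrix
B = H @ A @ H

print(np.allclose(B, Y @ Y.T))           # True: B = HAH equals Y Y^T
```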

2) Recover the coordinates using inner product matrix \(B\)

Both the diagonal and the off-diagonal entries of the inner product matrix \(B\) have now been expressed in terms of \(\Delta\). Since \(B = YY^T\), \(B\) is symmetric and positive semi-definite (all eigenvalues are non-negative), with \(t\) positive eigenvalues and \(N-t\) zero eigenvalues (assuming the centered configuration spans \(\mathbb{R}^t\)).

Our intuition is to apply a spectral (eigen)decomposition to \(B\) (equivalently, an SVD, since \(B\) is symmetric positive semi-definite) in order to recover the configuration \(\vec{y}_1, \vec{y}_2, \dots, \vec{y}_N \in \mathbb{R}^t\).

\[\begin{align} B & = \left(\vec{u}_1\left|\vec{u}_2\right| \ldots \mid \vec{u}_t\right) \begin{pmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_t \end{pmatrix} \left(\begin{array}{c} \vec{u}_1^{\top} \\ \vec{u}_2^{\top} \\ \vdots \\ \vec{u}_t^{\top} \end{array}\right) \\ & = \tilde{U} \begin{pmatrix} \lambda^{1/2}_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda^{1/2}_t \end{pmatrix} \begin{pmatrix} \lambda^{1/2}_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda^{1/2}_t \end{pmatrix} \tilde{U}^T \\ & = (\tilde{U} \Lambda^{1/2}) (\tilde{U} \Lambda^{1/2})^T \end{align}\]

Let \(Y=\tilde{U} \Lambda^{1/2}\), where \(\tilde{U} = (\vec{u}_1 | \cdots | \vec{u}_t)\) and \(\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_t)\), and use the rows of \(\tilde{U} \Lambda^{1/2}\) as \(\vec{y}_1, \vec{y}_2, \dots, \vec{y}_N\). This configuration reproduces every entry of the inner product matrix \(B\), and hence all pairwise distances \(\Delta_{ij}\).
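The whole procedure can be collected into a few lines. The sketch below (Python/NumPy; function and variable names are our own) recovers a synthetic configuration whose pairwise distances match the input exactly, up to rotation/reflection/translation:

```python
import numpy as np

def classical_mds(D, t):
    """Classical scaling: recover a t-dimensional configuration from a
    Euclidean distance matrix D, following the derivation above."""
    N = D.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N          # centering matrix
    B = -0.5 * H @ (D ** 2) @ H                  # inner product matrix
    vals, vecs = np.linalg.eigh(B)               # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:t]             # t largest eigenvalues
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# Round trip on a synthetic configuration in R^2.
rng = np.random.default_rng(2)
Y = rng.normal(size=(7, 2))
D = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)

Y_hat = classical_mds(D, t=2)
D_hat = np.linalg.norm(Y_hat[:, None] - Y_hat[None, :], axis=-1)
print(np.allclose(D, D_hat))                     # True: distances recovered
```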

Dealing with the real-data case

In real-data cases, the Euclidean condition is sometimes not met. As in the previous globe map example, the distance matrix actually contains geodesic distances instead of Euclidean distances. Under these circumstances, the “inner product” matrix \(B = HAH\) (in quotation marks because \(B\) is no longer a true inner product matrix) is not necessarily positive semi-definite.

Let’s use a numeric example to illustrate this point.

Suppose we have a proximity matrix \[ \Delta = \begin{pmatrix} 0 & 1 & 3 \\ 1 & 0 & 1 \\ 3 & 1 & 0 \end{pmatrix} \]

Clearly, the matrix \(\Delta\) does not satisfy the Triangle Inequality (\(\Delta_{13} = 3 > \Delta_{12} + \Delta_{23} = 2\)), so it is not a Euclidean distance matrix (in fact, not even a distance matrix).

  1. Compute matrix \(A\) as: \[ A = -\frac{1}{2} \Delta^2 \] where the square is taken element-wise, i.e., \(A_{ij} = -\frac{1}{2}\Delta_{ij}^2\).
##      [,1] [,2] [,3]
## [1,]  0.0 -0.5 -4.5
## [2,] -0.5  0.0 -0.5
## [3,] -4.5 -0.5  0.0
  2. Compute matrix \(B\) using: \[ H = I - \frac{1}{n} \mathbf{11}^T \] where \(I\) is the identity matrix and \(n\) is the number of rows (or columns) of \(\Delta\). Then: \[ B = H A H \]
##            [,1]       [,2]       [,3]
## [1,]  2.1111111  0.2777778 -2.3888889
## [2,]  0.2777778 -0.5555556  0.2777778
## [3,] -2.3888889  0.2777778  2.1111111
  3. Finally, perform an eigendecomposition of matrix \(B\).
## [1]  4.500000e+00 -7.771561e-16 -8.333333e-01
##               [,1]      [,2]       [,3]
## [1,]  7.071068e-01 0.5773503 -0.4082483
## [2,]  1.110223e-16 0.5773503  0.8164966
## [3,] -7.071068e-01 0.5773503 -0.4082483

Here we obtain a negative eigenvalue, \(-\frac{5}{6} \approx -0.833\). This happens because the original proximity matrix \(\Delta\) is not a distance matrix.
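The computation above is easy to reproduce; here is a sketch in Python/NumPy (the original output was presumably produced in R):

```python
import numpy as np

# The 3x3 proximity matrix from the example above.
Delta = np.array([[0., 1., 3.],
                  [1., 0., 1.],
                  [3., 1., 0.]])

N = 3
A = -0.5 * Delta ** 2                    # element-wise square
H = np.eye(N) - np.ones((N, N)) / N      # centering matrix
B = H @ A @ H

vals = np.sort(np.linalg.eigvalsh(B))    # ascending order
print(vals)                              # approximately -5/6, 0, 9/2
```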

Handling Negative Eigenvalues

  1. Non-symmetric case: there are cases where the proximity matrix \(\Delta\) is not symmetric (though this rarely happens in the classical scaling scenario). A common fix is to symmetrize it, e.g. \(\Delta \leftarrow \frac{1}{2}(\Delta + \Delta^T)\).

  2. When the “inner product” matrix \(B\) has negative eigenvalues, we usually have two options.

  1. Inflate the original proximity matrix \(\Delta\) by a small constant \(c\), i.e., \(\Delta_{ij} \leftarrow \Delta_{ij} + c\) for \(i \neq j\). A suitably chosen \(c\) repairs the violation of the Triangle Inequality (this is known as the additive constant problem).

  2. If the negative eigenvalues are small in absolute value (compared with the largest positive eigenvalues), and there are at least as many positive eigenvalues as our prior estimate \(t\) of the original dimension, we may simply keep the largest \(t\) eigenvalues and discard the rest.
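As a sketch of option 2 on the 3×3 example above (Python/NumPy; the truncation threshold is our own choice), keeping only the positive eigenvalue yields a one-dimensional configuration whose distances satisfy the triangle inequality again, with equality:

```python
import numpy as np

Delta = np.array([[0., 1., 3.],
                  [1., 0., 1.],
                  [3., 1., 0.]])
N = 3
H = np.eye(N) - np.ones((N, N)) / N
B = H @ (-0.5 * Delta ** 2) @ H

vals, vecs = np.linalg.eigh(B)
keep = vals > 1e-10                       # drop the zero and negative eigenvalues
Y = vecs[:, keep] * np.sqrt(vals[keep])   # here: a 1-D configuration

D_hat = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
print(D_hat)                              # short distances stretched to 1.5
```

The long distance \(\Delta_{13} = 3\) is preserved, while the two short distances are stretched from 1 to 1.5 so that the points fit on a line.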

Global City Distance case revisited

Let us revisit the earlier global city distance matrix example and plot the scree plot of the eigenvalues of the inner product matrix.

We find that the first three eigenvalues are much larger than the rest, so we take the dimension of the original configuration to be 3, which agrees with our knowledge that the cities lie on the surface of the globe.

Metric MDS

Nonmetric MDS